Theano

A language in a language

Dealing with weights matrices and gradients can be tricky and sometimes not trivial. Theano is a great framework for handling vectors, matrices and high dimensional tensor algebra. Most of this tutorial will refer to Theano however TensorFlow is another great framework capable of providing an incredible abstraction for complex algebra. More on TensorFlow in the next chapters.



In [1]:

    
import theano
import theano.tensor as T

Symbolic variables

Theano has it's own variables and functions, defined the following



In [2]:

    
x = T.scalar()



In [3]:

    
x









    Out[3]:





<TensorType(float64, scalar)>

Variables can be used in expressions



In [4]:

    
y = 3*(x**2) + 1

y is an expression now

Result is symbolic as well



In [5]:

    
type(y)
y.shape









    Out[5]:





Shape.0

printing

As we are about to see, normal printing isn't the best when it comes to theano



In [6]:

    
print(y)









    



Elemwise{add,no_inplace}.0



In [7]:

    
theano.pprint(y)









    Out[7]:





'((TensorConstant{3} * (<TensorType(float64, scalar)> ** TensorConstant{2})) + TensorConstant{1})'



In [8]:

    
theano.printing.debugprint(y)









    



Elemwise{add,no_inplace} [id A] ''   
 |Elemwise{mul,no_inplace} [id B] ''   
 | |TensorConstant{3} [id C]
 | |Elemwise{pow,no_inplace} [id D] ''   
 |   |<TensorType(float64, scalar)> [id E]
 |   |TensorConstant{2} [id F]
 |TensorConstant{1} [id G]

Evaluating expressions

Supply a dict mapping variables to values



In [9]:

    
y.eval({x: 2})









    Out[9]:





array(13.0)

Or compile a function



In [10]:

    
f = theano.function([x], y)



In [11]:

    
f(2)









    Out[11]:





array(13.0)

Other tensor types



In [12]:

    
X = T.vector()
X = T.matrix()
X = T.tensor3()
X = T.tensor4()

Automatic differention

Gradients are free!



In [13]:

    
x = T.scalar()
y = T.log(x)



In [14]:

    
gradient = T.grad(y, x)
print(gradient)
print(gradient.eval({x: 2}))
print((2 * gradient))









    



Elemwise{true_div}.0
0.5
Elemwise{mul,no_inplace}.0

Shared Variables

Symbolic + Storage



In [15]:

    
import numpy as np
x = theano.shared(np.zeros((2, 3), dtype=theano.config.floatX))



In [16]:

    
x









    Out[16]:





<TensorType(float64, matrix)>

We can get and set the variable's value



In [17]:

    
values = x.get_value()
print(values.shape)
print(values)









    



(2, 3)
[[ 0.  0.  0.]
 [ 0.  0.  0.]]



In [18]:

    
x.set_value(values)

Shared variables can be used in expressions as well



In [19]:

    
(x + 2) ** 2









    Out[19]:





Elemwise{pow,no_inplace}.0

Their value is used as input when evaluating



In [20]:

    
((x + 2) ** 2).eval()









    Out[20]:





array([[ 4.,  4.,  4.],
       [ 4.,  4.,  4.]])



In [21]:

    
theano.function([], (x + 2) ** 2)()









    Out[21]:





array([[ 4.,  4.,  4.],
       [ 4.,  4.,  4.]])

Updates

Store results of function evalution
dict mapping shared variables to new values



In [22]:

    
count = theano.shared(0)
new_count = count + 1
updates = {count: new_count}

f = theano.function([], count, updates=updates)



In [23]:

    
f()









    Out[23]:





array(0)



In [24]:

    
f()









    Out[24]:





array(1)



In [25]:

    
f()









    Out[25]:





array(2)

Warming up! Logistic Regression



In [26]:

    
%matplotlib inline



In [27]:

    
import numpy as np
import theano
import theano.tensor as T
import matplotlib.pyplot as plt

Kaggle Challenge Data

The Otto Group is one of the world’s biggest e-commerce companies, A consistent analysis of the performance of products is crucial. However, due to diverse global infrastructure, many identical products get classified differently. For this competition, we have provided a dataset with 93 features for more than 200,000 products. The objective is to build a predictive model which is able to distinguish between our main product categories. Each row corresponds to a single product. There are a total of 93 numerical features, which represent counts of different events. All features have been obfuscated and will not be defined any further.

https://www.kaggle.com/c/otto-group-product-classification-challenge/data

For this section we will use the Kaggle Otto Group Challenge Data. You will find these data in

../data/kaggle_ottogroup/ folder.

Note We already used this dataset in the 1.2 Introduction - Tensorflow notebook, as well as 1.3 Introduction - Keras notebook.



In [28]:

    
import os
import sys
nb_dir = os.path.abspath('..')
if nb_dir not in sys.path:
    sys.path.append(nb_dir)



In [29]:

    
from kaggle_data import load_data, preprocess_data, preprocess_labels









    



Using TensorFlow backend.



In [30]:

    
print("Loading data...")
X, labels = load_data('../data/kaggle_ottogroup/train.csv', train=True)
X, scaler = preprocess_data(X)
Y, encoder = preprocess_labels(labels)


X_test, ids = load_data('../data/kaggle_ottogroup/test.csv', train=False)
X_test, ids = X_test[:1000], ids[:1000]

#Plotting the data
print(X_test[:1])

X_test, _ = preprocess_data(X_test, scaler)

nb_classes = Y.shape[1]
print(nb_classes, 'classes')

dims = X.shape[1]
print(dims, 'dims')









    



Loading data...
[[  0.   0.   0.   0.   0.   0.   0.   0.   0.   3.   0.   0.   0.   3.
    2.   1.   0.   0.   0.   0.   0.   0.   0.   5.   3.   1.   1.   0.
    0.   0.   0.   0.   1.   0.   0.   1.   0.   1.   0.   1.   0.   0.
    0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
    0.   0.   0.   0.   0.   0.   0.   3.   0.   0.   0.   0.   1.   1.
    0.   1.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
    0.  11.   1.  20.   0.   0.   0.   0.   0.]]
9 classes
93 dims

Now lets create and train a logistic regression model.

Hands On - Logistic Regression



In [31]:

    
#Based on example from DeepLearning.net
rng = np.random
N = 400
feats = 93
training_steps = 10

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following sections of this tutorial
                                          # See: Intro to tf & Keras)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)),
          allow_input_downcast=True)
predict = theano.function(inputs=[x], outputs=prediction, allow_input_downcast=True)

#Transform for class1
y_class1 = []
for i in Y:
    y_class1.append(i[0])
y_class1 = np.array(y_class1)

# Train
for i in range(training_steps):
    print('Epoch %s' % (i+1,))
    pred, err = train(X, y_class1)

print("target values for Data:")
print(y_class1)
print("prediction on training set:")
print(predict(X))









    



Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
target values for Data:
[ 0.  0.  0. ...,  0.  0.  0.]
prediction on training set:
[False False  True ..., False False False]



In [ ]: